Abstract

Background: Anatomy ontologies are query-able classifications of anatomical structures. They provide a widely-used 
means for standardising the annotation of phenotypes and expression in both human-readable and 
programmatically accessible forms. They are also frequently used to group annotations in biologically meaningful 
ways. Accurate annotation requires clear textual definitions for terms, ideally accompanied by images. Accurate 
grouping and fruitful programmatic usage requires high-quality formal definitions that can be used to automate 
classification and check for errors. The Drosophila anatomy ontology (DAO) consists of over 8000 classes with broad 
coverage of Drosophila anatomy. It has been used extensively for annotation by a range of resources, but until 
recently it was poorly formalised and had few textual definitions. 

Results: We have transformed the DAO into an ontology rich in formal and textual definitions in which the majority 
of classifications are automated and extensive error checking ensures quality. Here we present an overview of the 
content of the DAO, the patterns used in its formalisation, and the various uses it has been put to. 

Conclusions: As a result of the work described here, the DAO provides a high-quality, queryable reference for the 
wild-type anatomy of Drosophila melanogaster and a set of terms to annotate data related to that anatomy. Extensive, 
well referenced textual definitions make it both a reliable and useful reference and ensure accurate use in annotation. 
Wide use of formal axioms allows a large proportion of classification to be automated and the use of consistency 
checking to eliminate errors. This increased formalisation has resulted in significant improvements to the 
completeness and accuracy of classification. The broad use of both formal and informal definitions makes further 
development of the ontology sustainable and scalable. The patterns of formalisation used in the DAO are likely to be 
useful to developers of other anatomy ontologies. 
Background

Anatomy ontologies 

Anatomy ontologies are queryable classifications of 
anatomical structures. They are commonly used by bioin- 
formatics resources to provide controlled vocabularies for 
annotating a range of entities (such as research papers, 
genes and genotypes). Typically curation is done manually 
and consists of assertions about phenotypes and expres- 
sion patterns [1-4] but many other types of assertion are 
possible. For manual annotation, class and part hierar- 
chies in ontologies provide terms with a range of speci- 
ficity allowing curators to choose an appropriately precise 
term depending on the information available. Term names 
alone are frequently ambiguous. Textual definitions of 
terms, ideally supplemented with images, are therefore 
important for consistent and accurate manual annotation. 

Anatomy ontologies are also used to group annotations 
in biologically meaningful ways. This is commonly done 
by grouping annotations using class and part hierarchies 
(partonomy). For example, a query for genes expressed in 
the Drosophila leg would return gene expression anno- 
tated with the term middle leg (a subclass of leg) and 
claw (a part of the leg) as well as with the term leg. The 
usefulness of such grouping depends on the accuracy of 
classification and of assertions about partonomy. More 
sophisticated groupings can be achieved by taking advan- 
tage of ontology semantics expressed in a formal language 
such as OWL. For example, Virtual Fly Brain (VFB) groups 
annotations based on inference of overlap between neu- 
rons and gross neuro-anatomical structures as well as 
using partonomy and classification [5,6]. 
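
The grouping of annotations via class and part hierarchies can be illustrated with a minimal Python sketch. It assumes a toy ontology fragment containing only the leg, middle leg and claw example given above; the gene names and the function names are hypothetical and serve only to show the behaviour.

```python
# Toy ontology fragment: child -> set of parents, one map per relation.
# Only the leg example from the text is encoded here.
IS_A = {"middle leg": {"leg"}}
PART_OF = {"claw": {"leg"}}


def grouping_terms(term):
    """Return the term plus every term reachable via is_a or part_of edges."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(IS_A.get(current, ()))
        stack.extend(PART_OF.get(current, ()))
    return seen


def genes_expressed_in(annotations, anatomy_term):
    """Genes annotated to anatomy_term or to any of its subclasses or parts."""
    return sorted({gene for gene, term in annotations
                   if anatomy_term in grouping_terms(term)})


# Hypothetical annotations (gene, anatomy term).
ANNOTATIONS = [("geneA", "leg"), ("geneB", "middle leg"),
               ("geneC", "claw"), ("geneD", "wing")]

print(genes_expressed_in(ANNOTATIONS, "leg"))
```

A query for leg returns the genes annotated to leg, middle leg and claw, but not the gene annotated to wing. The quality of such results depends directly on the accuracy of the is_a and part_of edges being traversed.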
Various resources, including model organism databases, 
use anatomy ontologies as searchable stores of informa- 
tion about anatomy. Annotation of terms with synonyms 
provides a means for users to search for anatomical 
structures using the various names used for them in 
the literature. Textual definitions provide human read- 
able information about anatomy and links to references 
and in some cases, images. Formal relationships pro- 
vide a means to browse to related terms as well as for 
grouping annotations. Formalisation of anatomy ontolo- 
gies in OWL also has great potential as a source of 
queryable information about anatomy. For example, VFB 
uses OWL queries to provide answers to user queries 
about neuronal connectivity in the Drosophila nervous 
system. 

Anatomy ontologies and formal ontology languages 

OWL 2 Web Ontology Language (OWL2) [7] is a 
W3C recommended, description-logic based ontology 
language. Its rigorous definition, web integration and the 
wide availability of fast reasoners make it a very attrac- 
tive language to use for ontology building. The EL pro- 
file of OWL2 [8] is particularly attractive as reasoning 
times scale very well with increasing size and complex- 
ity and new, fast reasoners that take advantage of this are 
available [9]. 

Most widely used anatomy ontologies, or ontologies 
with major anatomical components such as Snomed- 
CT (http://www.ihtsdo.org/snomed-ct), were developed 
prior to the publication of the OWL2 specification and 
so use different formalisms. One of the most com- 
monly used ontology languages apart from OWL is Open 
Biomedical Ontologies format (OBO format). Historically, 
OBO format ontologies have been manually maintained 
and only weakly formalised compared to what is pos- 
sible in OWL2. Improvements to the expressiveness of 
OBO format and the definition of OBO format seman- 
tics via mapping to OWL2 [10,11] have made it pos- 
sible to formalise definitions so that OWL2 reasoners 
can be used to automate classification, check for consis- 
tency and run queries. Referencing terms from external 
ontologies in formal definitions makes cross ontology 
querying possible and, when combined with modular- 
isation strategies, allows auto-classification to leverage 
the semantics of other ontologies. By using OWLtools 
(https://code.google.com/p/owltools/), it is possible to do 
this while keeping the master version of an ontology in 
OBO 1.4. 

This approach is already being used to improve the 
Gene Ontology (GO) [12,13], a number of phenotype 
ontologies [14-16], the Cell Ontology (CL) [17] and is cen- 
tral to construction of the multi-species anatomy ontology 
Uberon [18]. The expressiveness of OBO 1.4 is almost 
entirely within the EL profile of OWL2, meaning that, 
with a few precautions, it is possible to take advantage 
of the new generation of fast EL reasoners when working 
with OWL2 translations of OBO ontologies. 

A brief history of the Drosophila anatomy ontology 

The DAO has its origins in a semi-structured, con- 
trolled vocabulary developed by Michael Ashburner for 
use in annotation by FlyBase over 20 years ago. An early 
draft (https://sourceforge.net/p/fbbtdv/code/HEAD/tree/ 
fbbt/releases/prehistory/proto-FBbt-1992.txt) had over 
2500 terms arranged in a single inheritance hierarchy with 
76 references, each attached to a mid-level node in the 
hierarchy. It had no named relations or textual definitions, 
but its simple structure made it easy for users and edi- 
tors to follow. Following its initial drafting, the ontology 
grew largely by addition of terms requested by curators 
and gradually became more formal with the adoption of 
standards developed for the GO, including OBO format. 
With increased formalisation came the use of explicit clas- 
sification extending to multiple axes (each term could 
have more than one parent class), and the use of named, 
although undefined, relations for partonomy and devel- 
opment. During this growth, very few textual definitions 
were added. 

This path of development eventually became unsustain- 
able. By 2006, the ontology had over 6000 classes, only 
4% of which were defined. Ontologies with multiple axes 
of classification are extremely hard to maintain by hand 
beyond a certain size [19]. Editors wanting to add a new 
class will have to create all the appropriate classifications 
manually, but in a large and growing ontology it can be 
very difficult to ascertain what classifications are available. 
Even where this is clear, it can be very difficult to judge 
the most appropriate place in each classification hierarchy 
to place the new class. This problem grows more diffi- 
cult as the ontology grows, as more axes of classification 
are added and is compounded by the arrival of new edi- 
tors lacking the tacit knowledge of those who created the 
ontology. In the absence of textual definitions, the tacit 
knowledge required for maintenance includes the mean- 
ings of the terms themselves and the reasons for existing 
classifications. For these reasons, the DAO classification 
hierarchy had accumulated gaps, redundancies, errors and 
duplications. (See Figure 1 for an example of gaps in the 
classification hierarchy). 

A much more sustainable approach is to formally spec- 
ify the properties of classes, where possible specifying a 
set of necessary and sufficient conditions for class mem- 
bership. Standard OWL reasoning software can then be 
used to automate classification. Making the conditions 
for class membership explicit using OWL axioms also 
makes it possible to query the ontology for classes with 
particular combinations of properties. When combined 
with declarations of disjointness between classes (e.g. 
nothing can be both a muscle cell and a neuron) and 
other formalisms, a reasoner can also be used to detect 
errors. 

Costa et al. Journal of Biomedical Semantics 2013, 4:32 
http://www.jbiomedsem.com/content/4/1/32 

[Figure 1 diagram. Panel A: the asserted is_a (SubClassOf) hierarchy above 'mechano-chemo-sensory labral sensillum 8' before refactoring. Panel B: the hierarchy above the same class after refactoring, with asserted and inferred is_a edges.] 

Figure 1 Before and after refactoring. An example of incomplete classification, fixed by refactoring. (A) shows the classification of 
'mechano-chemosensory labral sensillum 8' prior to refactoring. Many valid classifications are missing. Note also the erroneous classification of 
'external sensory organ' as a type of 'sensory organ cell'. (B) shows autoclassification of the same class after refactoring. Terms with equivalent class 
definitions are shown in green. 

We have transformed the DAO into a richly for- 
malised ontology in OWL in which the majority of terms 
have textual definitions and much of the classification 
is automated. Here we describe its content, both formal 
and informal, its interconnection with other ontologies 
and its usage. Links for downloading various versions 
of the DAO and accessing documentation are provided 
in Table 1. The work described in this paper references 
release 2013-07-26 (see Table 1). 
Table 1 Accessing the DAO 

Target                                                    Base URL extension 
Homepage                                                  fbbt 
Term request tracker                                      fbbt/tracker 
Pre-reasoned OBO version, no imports                      fbbt/fbbt-simple.obo 
Full OWL version with imports                             fbbt/fbbt-non-classified.owl 
Version described in this paper                           fbbt/releases/2013-07-26/ 
Full download guide                                       fbbt/downloads 
Individual term details for FBbt_0000423 - Resolves       FBbt_0000423 
programmatically to XML 

The DAO is openly available under a Creative Commons Attribution licence. 
DAO files, documentation and related resources can all be accessed via 
Persistent URLs (PURLs). The base URL for all PURLs is 
http://purl.obolibrary.org/obo/. Column 1 describes the targets of various PURLs specified in Column 2 as 
extensions to this base URL. The two columns should be combined without any 
additional spaces, e.g. http://purl.obolibrary.org/obo/fbbt/fbbt-simple.obo. We 
recommend opening OWL versions in Protege 4.3 and classifying the full OWL 
version using the ELK reasoner [9]. The files in dated release folders follow the 
same naming convention as the latest versions. 
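
As a small illustration of how the base URL and the extensions are combined, the following Python sketch builds PURLs by plain concatenation. The function name is hypothetical; the base URL and the two extensions used are those given in the table and its legend.

```python
BASE_URL = "http://purl.obolibrary.org/obo/"


def purl(extension):
    """Join the base URL and an extension without any additional spaces."""
    return BASE_URL + extension.strip()


print(purl("fbbt/fbbt-simple.obo"))
print(purl("fbbt/downloads"))
```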

Results 

Criteria for inclusion in the DAO 

One of the main aims of the DAO is to provide a 
queryable reference source for information about wild- 
type Drosophila anatomy. For this reason, we only include 
named classes where there is good scientific evidence for 
the presence in wild-type animals of structures with the 
properties described in both formal and informal compo- 
nents of the class definition^. Access to this evidence is 
provided by links to the relevant literature and sometimes 
in the form of free text summaries provided as a comment 
separate from the class definition (see the definition of 
sound activated Johnston organ neuron below). In order to 
conform to this inclusion criterion, we have occasionally 
obsoleted mistakenly added classes that refer to structures 
not present in wild-type Drosophila. 

Informal definitions 

We have added well-referenced, textual definitions to over 
73% of classes. This work was co-ordinated with for- 
malisation and, in many cases, included the addition of 
comments containing a brief description of the evidence 
for assertions made in both formal and informal components 
of the definition. Formal assertions of properties 
are supported by references attached to the informal def- 
inition, allowing users and editors to quickly access the 
literature to judge for themselves whether the assertions 
are justified. 

Textual definitions in the DAO follow approximately 
an Aristotelian genus and differentia pattern. The definitions 
first state a general classification or genus (e.g. 
'A neuron that...') and then refine this with an account 
of the properties (differentia) that make members of this 
class different from others with the same classification. 
We optionally supplement this with a brief account of key 
properties that do not apply to all members where we con- 
sider such information useful to our users. References are 
included within the text of definitions, following typical 
academic style, as well as in the form of a list of identi- 
fiers to be used for rolling a hyperlinked bibliography for 
display. 
For example: 

name: sound activated Johnston organ neuron 
definition: A Johnston organ neuron (JON) that is 
activated by near-field sound ranging from 19 Hz to 
952 Hz, maximally at 90 dB (Kamikouchi et al, 2009; 
Yorozu et al, 2009). These neurons are transiently 
(phasically) activated by the onset and offset of arista 
displacement. Cells preferentially activated by 
low-frequency vibration are loosely distributed as a 
ring in the middle layer of JON cell bodies. Higher 
frequencies preferentially activate JON neurons with 
cell bodies located mainly in the inner layer, directly 
surrounding the antennal nerve (Kamikouchi et al., 
2006). 

comment: Response to sound and arista 
displacement has been determined 
electrophysiologically and by using the calcium 
sensor GCaMP (Yorozu et al, 2009, Kamikouchi 
et al., 2009). 

Wherever possible, we use consistent patterns to define 
groups of similar classes, such as neuron classes defined 
by lineage and innervation pattern, or muscle classes 
defined by their location, origin and insertion. 

Synonyms and disambiguation 

Anatomical terminology varies between different research 
groups and over time. Drosophila anatomy has a long his- 
tory, giving ample time for the development of varied and 
sometimes conflicting usage of terminology. The DAO 
would be of limited use if it did not reflect this: there 
is no way to know a priori what terminology users will 
be familiar with; text miners need to be able to match 
the variety of terms they encounter in the literature with 
appropriate ontology terms. 

To support textual searching and text mining, we have 
annotated many DAO terms with multiple synonyms. 
Wherever possible, these synonyms are linked to papers 
where they originate or that provide examples of their 
usage. For example, for the entire larval musculature, we 
have added referenced synonyms reflecting the two major 
nomenclatures currently in use [20,21] and a number of 
variants on them. 
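
How synonyms support textual search can be sketched in a few lines of Python. This is an illustrative assumption about one way to index names, not a description of how any resource implements its search; the identifiers (EX:0001, EX:0002) and the name lists are hypothetical examples and have not been checked against the DAO.

```python
def build_synonym_index(terms):
    """Map every lower-cased name or synonym to the set of term IDs using it."""
    index = {}
    for term_id, names in terms.items():
        for name in names:
            index.setdefault(name.strip().lower(), set()).add(term_id)
    return index


def lookup(index, text):
    """Return the sorted IDs of all terms matching the given name or synonym."""
    return sorted(index.get(text.strip().lower(), set()))


# Hypothetical terms: primary name first, then synonyms from two nomenclatures.
TERMS = {
    "EX:0001": ["ventral longitudinal muscle 1", "VL1", "muscle 12"],
    "EX:0002": ["ventral longitudinal muscle 2", "VL2", "muscle 13"],
}

index = build_synonym_index(TERMS)
print(lookup(index, "vl1"))
print(lookup(index, "Muscle 13"))
```

Because the index maps names to sets of identifiers, a name used for more than one structure returns several candidates, which is where disambiguation comments become useful.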
Where there are conflicting uses of terminology in the 
literature, we note this with a disambiguation comment. 
For example: 

name: ovariole 

comment: The term ovariole is sometimes 
incorrectly used to refer to individual egg chambers. 
Please use the term 'egg chamber' for this. 

Refactoring the DAO 

In order to improve the accuracy and completeness 
of classification in the DAO, we have refactored it to 
reduce asserted multiple inheritance classification. We 
have achieved this by adding formal specification of the 
properties of classes and then using this specification to 
infer a large proportion of the classification hierarchy. 
This work has been co-ordinated with the addition of 
textual definitions and references, with the references pro- 
viding a link to the evidence for assertions in both formal 
and informal components of the definition. At the same 
time, we have added declarations of disjointness in order 
to provide basic error checking. 

Fundamental declarations of disjointness hold between 
high-level anatomical classes. Adding these provides basic 
sanity checking for the ontology. For instance, noth- 
ing can be both a cell and a multicellular structure, or 
be both multicellular and acellular. As well as provid- 
ing basic sanity checks, such fundamental declarations of 
disjointness are useful for detecting mistakes much fur- 
ther down the hierarchy. An error picked up early in 
refactoring was the misclassification of external sensory 
organ (FBbt_00005168), a class of multicellular struc- 
ture, as a type of 'sensory organ cell' (FBbt_00005163) 
(Figure 1). 
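
The way a high-level disjointness declaration exposes an error further down the hierarchy can be sketched as follows. This is a simplified stand-in for what an OWL reasoner does when it reports an unsatisfiable class; the hierarchy is a toy fragment in which the placement of 'sense organ' directly under 'multicellular structure' is an assumption made for brevity, and the function names are hypothetical.

```python
def ancestors(cls, is_a):
    """Reflexive, transitive closure of the is_a relation for one class."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(is_a.get(current, ()))
    return seen


def disjointness_violations(is_a, disjoint_pairs):
    """List (class, a, b) where a class falls under both members of a disjoint pair."""
    found = []
    for cls in sorted(is_a):
        supers = ancestors(cls, is_a)
        for a, b in disjoint_pairs:
            if a in supers and b in supers:
                found.append((cls, a, b))
    return found


# Toy fragment (child -> asserted parents); simplified for illustration.
IS_A = {
    "sensory organ cell": {"cell"},
    "sense organ": {"multicellular structure"},
    # The second parent reproduces the misclassification described above.
    "external sensory organ": {"sense organ", "sensory organ cell"},
}
DISJOINT = [("cell", "multicellular structure")]

print(disjointness_violations(IS_A, DISJOINT))
```

Only 'external sensory organ' is reported, because it is the only class that ends up under both 'cell' and 'multicellular structure'.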

Such checks can only work if there is sufficient classifi- 
cation in place. Prior to refactoring, only 240/6024 DAO 
terms were classified to root and 933 had no classifica- 
tion at all. In the current ontology all terms are classified 
using terms mapped to the Common Anatomy Reference 
Ontology (CARO) [22] or extensions to it 
(https://code.google.com/p/caro2/). CARO classes are treated as pairwise 
disjoint. Cells are all classified, via a separately maintained 
file of bridging axioms, using the Cell Ontology 
(CL) [17]. These basic classifications are a pre-requisite 
for much of the consistency checking and automated clas- 
sification we have implemented and are also the basis for 
cross integration of anatomy ontologies by projects such 
as Uberon [18]. 

Rather than applying a rigid insistence on a single axis 
of classification, we have adopted a pragmatic approach, 
targeting easily formalisable classifications and tolerating 
dual inheritance where formalisations required to remove 
it were not obvious. Except for a few general class axioms 
used to specify disjointness, formalisation is restricted to 
the expressiveness of OBO format 1.4 [11]. Much of the 
formalisation required new relations, which we developed 
in co-ordination with the OBO relations ontology and 
the cell ontology. 

Patterns of formalisation 

The main axes of classification we targeted for formal- 
isation were partonomy and function. There are ample 
opportunities for leveraging the hierarchy of part_of 
(BFO_0000050) relationships to automate classification. 
For example, we can formally specify the class larval abdominal 
sensillum (FBbt_00002782) as: 

'larval abdominal sensillum' EquivalentTo sensillum 
that part_of some 'larval abdomen' 

We specify function using capable_of (RO_0002215) 
[17] or capable_of_part_of (RO_0002216) relationships 
to terms from the biological process branch of the GO. 
These two relations are linked by the property chain: 

capable_of o part_of SubPropertyOf 
capable_of_part_of 

This can be read as stating the rule: If X capable_of Y 
and Y part_of Z then X capable_of_part_of Z. With this 
in place, we can leverage both the part and class hierar- 
chies in the GO to structure the DAO (see Figure 2 for 
examples). 
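
The effect of this property chain can be sketched in Python as a single rule application over pairs of classes. This is a deliberately reduced illustration: a real reasoner would also take into account the transitivity of part_of and the class hierarchies on both sides. The function name is hypothetical, and the class names are taken from the auditory example in Figure 2.

```python
def infer_capable_of_part_of(capable_of, part_of):
    """Apply the rule: X capable_of Y and Y part_of Z => X capable_of_part_of Z."""
    inferred = set()
    for x, y in capable_of:
        for y2, z in part_of:
            if y == y2:
                inferred.add((x, z))
    return inferred


DETECTION = "detection of mechanical stimulus involved in sensory perception of sound"
PERCEPTION = "sensory perception of sound"

CAPABLE_OF = {("auditory sensory neuron", DETECTION)}  # neuron class -> GO process
PART_OF = {(DETECTION, PERCEPTION)}                    # GO process -> GO process

print(infer_capable_of_part_of(CAPABLE_OF, PART_OF))
```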

Much of the formalisation of the DAO follows well- 
documented design patterns (http://purl.obolibrary.org/ 
obo/fbbt/doc/odp). Nested class expressions are not per- 
mitted in OBO 1.4, so design patterns typically refer to 
simple subject, relation, object relationship patterns with 
existential quantification (subject relation some object). 
DAO design patterns have two components: (a) A spec- 
ification of how to record a particular property (such as 
sensory modality) for some specified (subject) class of 
structure using a particular combination of relation and 
object class; (b) A set of high level classes defined using 
EquivalentClass axioms that automate classification, and 
in some cases partonomy, for classes that use the pattern 
defined in (a). 

The formal definition and classification of nervous sys- 
tem components by their sensory modality provides a 
good example of this (see Figure 2). The GO biologi- 
cal process term detection of stimulus involved in sensory 
perception (GO_0050906) has a set of subclasses that 
are differentiated by the physical nature of the stimulus. 
Figure 2A shows a portion of this class hierarchy. A 
similar set of terms define the process of sensory percep- 
tion (GO_0007600) with each detection of stimulus class 
standing in a part_of relationship to a sensory perception 
class defined by the same stimulus type (three such pairs 
are shown in Figure 2C). 
Figure 2 Auto-classification of sensory modality. (A) Classification under 'detection of stimulus involved in sensory perception'. (B) Inferred 
classification of sensory neuron classes with sensory modality defined using the pattern: EquivalentTo neuron that capable_of some 'detection of 
stimulus involved in sensory perception' or one of its subclasses. (C) Classification and part relationships between subclasses of 'sensory perception' 
and subclasses of 'detection of stimulus involved in sensory perception'. (D) Populating the auditory system. An auditory system neuron is defined 
as "any neuron that is capable_of_part_of (some) 'sensory perception of sound'". This property is directly asserted for 'inferior ventrolateral 
protocerebrum IVLP-IVLP neuron', and inferred for 'auditory sensory neuron', which is defined as "neuron that capable_of (some) 'detection of 
mechanical stimulus involved in sensory perception of sound'". Inference comes from the part relation between the two GO terms and a property 
chain stating that if X capable_of Y and Y part_of Z then X capable_of_part_of Z. All auditory system neurons are asserted to be part_of (some) 
'auditory system', an assertion that is inherited by all classes inferred to be subclasses of 'auditory system neuron'. 



The design pattern for specifying the sensory modality 
of sensory neurons is simply: 

SubclassOf <neuron or one of its subclasses> that 
capable_of some <'detection of stimulus involved in 
sensory perception' or one of its subclasses> 

This is used as part of the formal definition of 
sound activated Johnston organ neuron (FBbt_00100002) 
(Figure 2B,D). A set of high level classes for sensory neu- 
rons, differentiated according to sensory modality, are 
defined for each 'detection of stimulus' class, following the 
pattern: 

sensory neuron EquivalentTo neuron that capable_of 
some <'detection of stimulus involved in sensory 
perception' or one of its subclasses> 



A reasoner can then automatically classify neurons 
whose definition follows the design pattern under the 
appropriate general class. For example, Figure 2B shows 
the inferred classification of sound activated Johnston 
organ neuron. 
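
The classification step can be approximated in Python for classes that follow the simple genus plus capable_of pattern. This sketch is not a substitute for an OWL reasoner: it handles only this one pattern. The is_a edges are a simplified, illustrative fragment, the defined class 'mechanosensory neuron' is a hypothetical example, and the genus and process recorded for 'sound activated Johnston organ neuron' are assumptions made for the example.

```python
def is_subclass(cls, target, is_a):
    """True if cls equals target or reaches it through is_a edges."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(is_a.get(current, ()))
    return False


def classify(asserted, defined, is_a):
    """Map each asserted class to the defined classes whose conditions it meets."""
    result = {}
    for name, (genus, process) in asserted.items():
        result[name] = sorted(
            d_name for d_name, (d_genus, d_process) in defined.items()
            if is_subclass(genus, d_genus, is_a)
            and is_subclass(process, d_process, is_a)
        )
    return result


DETECT = "detection of stimulus involved in sensory perception"
DETECT_MECH = "detection of mechanical stimulus involved in sensory perception"
DETECT_SOUND = "detection of mechanical stimulus involved in sensory perception of sound"

# Simplified is_a edges (child -> parents) across anatomy and GO terms.
IS_A = {
    "Johnston organ neuron": {"neuron"},
    DETECT_SOUND: {DETECT_MECH},
    DETECT_MECH: {DETECT},
}

# Defined classes: name -> (genus, process), read as
# "genus that capable_of some process".
DEFINED = {
    "sensory neuron": ("neuron", DETECT),
    "mechanosensory neuron": ("neuron", DETECT_MECH),  # hypothetical class
    "auditory sensory neuron": ("neuron", DETECT_SOUND),
}

# Asserted pattern for one class (assumed values for illustration).
ASSERTED = {
    "sound activated Johnston organ neuron": ("Johnston organ neuron", DETECT_SOUND),
}

print(classify(ASSERTED, DEFINED, IS_A))
```

The single asserted capable_of relationship is enough to place the class under all three defined classes, which is the behaviour that removes the need to assert each of these classifications by hand.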

The design pattern for asserting a downstream function 
for a neuron in sensory perception is: 

SubclassOf <neuron or one of its subclasses> that 
capable_of_part_of some <'sensory perception' or one of 
its subclasses> 

This is used as part of the formal definition of inferior 
ventrolateral protocerebrum IVLP-VLP neuron 
(FBbt_00110126) (Figure 2D). For each sensory system, we 
define a general class using this pattern, and assert that 
all members of this class are part of the relevant sensory 
system. For example, for the auditory system we define: 

'auditory system neuron' 

EquivalentTo neuron that capable_of_part_of some 
'sensory perception of sound' 

SubclassOf part_of some 'auditory system'^ 

Figure 2D shows the automated classification that 
results from these formalisations, including automated 
population of the partonomy of the auditory system (see 
figure legend for details). Over 1400 classes of neuron and 
sense organ in DAO are automatically classified accord- 
ing to their sensory modality using this pattern. Many 
other design patterns are used in the DAO. These include 
patterns for representing neuro-anatomy [6], classifying 
larval musculature and classifying larval trachea. In total, 
almost 50% of the > 10,000 classifications in the current, 
pre-reasoned versions of the DAO originate as inferred 
classifications. 

Using images to define ontology terms 

Anatomy is an intensely visual subject. Anatomical images 
can communicate the meaning of anatomical terms much 
more rapidly and efficiently than text and can be irre- 
placeable as a means of communicating the position of 
boundaries. Textual definitions can be enhanced by pro- 
viding example images and schematic drawings - either 
informally annotated (FlyBase provides over 1100 of these 
for the DAO), or annotated as OWL individuals (VFB 
provides over 17000 3D images of neurons and neural 
clones, annotated with OWL axioms referencing DAO 
classes). But it can also be useful to explicitly define an 
anatomical class in relation to a standard reference image. 
This is particularly useful where there is a standard 3D 
reference image available that defines the boundaries of 
regions and a standard co-ordinate space onto which mul- 
tiple images can be mapped via registration (warping) 
algorithms. There are two such standards available for the 
adult Drosophila brain, FlyCircuit [23] and BrainName^. 
We have defined OWL individuals corresponding to each 
of the painted brain regions in the BrainName standard 
brain. Part of the formal definition of each term in the 
DAO for one of these brain regions is a formal link to 
the individual corresponding to the appropriate region of 
the standard brain. Figure 3A-C shows one such brain 
region, the lateral horn, and the axiom that links this 
individual brain region to the class 'lateral horn' in the 
ontology. 

The VFB project has generated tens of thousands of OWL 
axioms recording the overlap of individual registered neu- 
rons to brain regions defined in this way. Figure 3D shows 
a neuron, registered to the standard brain, that overlaps 
the painted region defining the boundaries of the lateral 
horn (see figure legend for formalisation details). 



Current content 

As a result of the work described here, the DAO now con- 
sists of over 8000 classes and over 16000 logical axioms 
covering all aspects of Drosophila anatomy from the start 
of development through to mature adulthood. Through 
the work of the VFB project, neuro-anatomical content is 
particularly rich. Figure 4 shows the proportion of terms 
by system. 73% of classes have textual definitions, which 
in total reference almost 500 publications. As a result of 
refactoring, approximately half of over 10,000 classifica- 
tions in the pre-reasoned version of the ontology are now 
inferred rather than asserted. Extensive use of disjointness 
axioms provides error checking. 

Applications 

Semi-formalised annotation of phenotype and expression 
using DAO 

The majority of annotation with the DAO records phe- 
notypes or expression patterns and consists of associa- 
tions between DAO terms, genetic features and terms 
from other ontologies stored in a relational database. 
While there is generally some level of informal agree- 
ment about the meaning of these associations, their for- 
mal semantics remain unspecified. This approach is used 
by FlyProt (http://www.flyprot.org/) and RedFly [24] and 
most extensively by FlyBase [3], which uses it in over 
260,000 annotations of phenotypes and expression pat- 
terns and to annotate over 1100 anatomical images. These 
resources consume a simple, pre-reasoned version of 
the ontology in OBO format (see Table 1), which they 
use to group annotations via combined class and part 
hierarchies. 

Formal annotation using DAO 

An alternative approach is to formalise annotation assertions 
in an OWL knowledge base. This allows the semantics 
of annotation to be more precisely specified and 
allows annotations to be classified and queried using an 
OWL reasoner. The resulting queries have more precisely 
specified semantics than is typical for SQL queries of 
annotations and the results lists are enriched via logical 
inferences that would not be available via SQL. 

VFB uses this approach to annotate 3D images of 
neuroanatomical structures. It currently has over 17000 
images of neurons, neuron clusters and clones annotated 
as OWL individuals. These are all typed using named 
classes and class expressions referencing the DAO. Typ- 
ing uses a mixture of manual annotation and automated 
annotation based on computation of voxel overlap to ref- 
erence regions in a standard brain to which all images are 
registered. Combining the resulting knowledge-base with 
the DAO and classifying using an OWL reasoner auto-classifies 
many of the neurons and integrates them into the 
VFB query system. 
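
Automated annotation based on voxel overlap can be sketched as follows. This is only an illustration of the general idea under simple assumptions: voxels are represented as sets of integer coordinates in the shared reference space, any region sharing at least min_voxels voxels with the neuron counts as overlapping, and the threshold, the coordinates and the function names are all hypothetical. The actual image analysis used by VFB is not described here.

```python
def overlapping_regions(neuron_voxels, regions, min_voxels=1):
    """Names of regions sharing at least min_voxels voxels with the neuron."""
    return sorted(name for name, voxels in regions.items()
                  if len(neuron_voxels & voxels) >= min_voxels)


def overlap_expressions(neuron_voxels, regions, min_voxels=1):
    """Class expressions of the form: overlaps some '<region>'."""
    return ["overlaps some '{}'".format(name)
            for name in overlapping_regions(neuron_voxels, regions, min_voxels)]


# Toy painted regions and a toy registered neuron, as sets of (x, y, z) voxels.
REGIONS = {
    "lateral horn": {(0, 0, 0), (0, 0, 1), (0, 1, 1)},
    "mushroom body": {(5, 5, 5), (5, 5, 6)},
}
NEURON = {(0, 0, 1), (0, 1, 1), (2, 2, 2)}

print(overlap_expressions(NEURON, REGIONS))
```

Each resulting class expression can then be attached to the OWL individual representing the image, so that a reasoner can classify it together with the DAO.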
Figure 3 Defining with images. (A) horizontal, (B) frontal and (C) sagittal sections through a standard reference Drosophila brain in which the 
lateral horn region is highlighted in purple with boundaries defined by the BrainName standard^. We record a formal connection between an OWL 
individual representing the painted region and the class as: 'lateral horn' EquivalentTo has_reference_image value 'BrainName exemplar lateral 
horn'. (D) shows an image of a single neuron that has been registered to the standard brain. Image analysis has determined overlap between the 
neuron and the region defined as lateral horn in the standard. Based on this, the neuron in the image has been annotated with the axiom: 
SubClassOf overlaps some 'lateral horn'. (Painting of standard brain by A.Jennet and K.Shinomiya. The single neuron image in panel D was derived 
from a neuron imaged by FlyCircuit [23] with registration and image analysis by MC and Gregory SXE Jefferis). 



The DAO as a queryable reference for Drosophila anatomy 

So far, VFB is the only resource to use the DAO as a 
queryable reference for Drosophila anatomy. While VFB 
only provides access to neuroanatomical content of the 
DAO, it displays a complete set of associated informa- 
tion for all terms it displays, including a definition, com- 
ments and synonyms along with hyperlinked references, 
classifications and relationships. It also provides a set of 
queries of anatomy, images and annotations, tailored to 
the class of term being displayed. For example, users can 
run queries from brain region terms to find all the neu- 
ron classes that have synaptic terminals in that region. 
Both the choice of queries available for any given term 
and the queries themselves are specified using OWL-DL 
queries, run using the OWL reasoner ELK [9]. Annota- 
tions of expression and phenotype are pulled from FlyBase 
but use a pre-query in OWL that finds significantly more 
classes for grouping annotations than simple grouping via 
class and part hierarchies [5]. Queries for images use the 
DAO in combination with an OWL knowledge base. 
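
As a rough illustration of queries tailored to the class of term being displayed, the following Python sketch builds OWL-MS class expressions from templates. The template table, the grouping of terms into kinds and the relation name has_synaptic_terminal_in are hypothetical; only the overlaps relation and the quoting convention are taken from this paper. The resulting strings would still need to be passed to an OWL reasoner to be answered.

```python
def quote(name):
    """Quote an OWL entity name only if it contains spaces."""
    return f"'{name}'" if " " in name else name


# Hypothetical query templates, keyed by the kind of term being displayed.
QUERY_TEMPLATES = {
    "brain region": {
        "neurons with synaptic terminals here":
            "neuron that has_synaptic_terminal_in some {term}",
        "neurons overlapping this region":
            "neuron that overlaps some {term}",
    },
    "neuron": {
        "subclasses of this neuron class": "{term}",
    },
}


def queries_for(term_label, term_kind):
    """Return the OWL-MS class expressions offered for a displayed term."""
    templates = QUERY_TEMPLATES.get(term_kind, {})
    return {
        name: template.format(term=quote(term_label))
        for name, template in templates.items()
    }


queries = queries_for("lateral horn", "brain region")
for name, expression in queries.items():
    print(f"{name}: {expression}")
```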

Figure 4 DAO content by system. Content of the DAO, divided by anatomical system. 

Support for using the DAO in annotation 

We maintain a Google Code site (https://code.google.com/p/vfb-annotation-tools/) 
dedicated to templates for using the spreadsheet-based 
annotation system Populous [25] to annotate using the 
DAO. We chose Populous over 
alternatives such as Phenote (http://www.phenote.org/) 
because a spreadsheet-based annotation system has al- 
most no overhead for non-expert users and because of its 
support for logically defining both input terms and output 
annotations in OWL. The resulting spreadsheets have a 
validation system, ensuring that users only annotate with 
valid term names. IDs are linked behind the scenes so 
that users do not need to track them. This can be used 
for simple informal tagging systems. It also provides the 
option of writing the resulting spreadsheets to OWL using 
templates specified via OPPL [26]. Doing so allows the 
semantics of annotation to be strictly specified and anno- 
tations to be queried in combination with the DAO using 
OWL-DL queries. 
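
The validation step can be pictured with a short Python sketch. This is not how Populous works internally; the column names, the sample rows and the function name are assumptions. It only illustrates the idea that users type term names, invalid names are rejected, and IDs are attached behind the scenes. The single ID used is the one given for mushroom body later in this paper.

```python
import csv
import io

# Label-to-ID lookup; a real lookup would be generated from the ontology.
LABEL_TO_ID = {"mushroom body": "FBbt_00005801"}


def validate_rows(rows, label_to_id):
    """Split spreadsheet rows into valid annotations and rejected labels."""
    annotations, errors = [], []
    # The header is line 1 of the sheet, so data rows start at line 2.
    for line_no, row in enumerate(rows, start=2):
        label = row["anatomy"].strip()
        term_id = label_to_id.get(label)
        if term_id is None:
            errors.append((line_no, label))
        else:
            annotations.append((row["image"], term_id))
    return annotations, errors


# Hypothetical sheet with one valid and one misspelled term name.
sheet = "image,anatomy\nimg_001,mushroom body\nimg_002,mushrom body\n"
rows = list(csv.DictReader(io.StringIO(sheet)))
annotations, errors = validate_rows(rows, LABEL_TO_ID)
print(annotations)
print(errors)
```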

Future directions 

There remains significant scope for useful refactoring of 
the DAO. 27% of classes remain undefined, with signifi- 
cant work remaining to complete definitions for the adult 
musculature and embryonic developmental anatomy. A 
number of classes still have only crude classifications, 
including over 40 only classified as anatomical entity! 
Refining the classification of crudely classified classes is a 
current priority, along with adding disjointness axioms to 
improve error checking. 

Other future work is likely to focus on improving the 
representation of stage and its relationship to parton- 
omy. There are no formal relationships between terms in 
the DAO and those in the Drosophila Stage Ontology. 
These would be useful for a number of purposes - not 
least in formalising the definition of phenotype and 
expression curation in FlyBase that combines these two 
ontologies. 



The DAO currently uses non-rigid classes [27] for struc- 
tures that persist between life stages. For example, we 
have the general term mushroom body (FBbt_00005801) 
with stage-specific subclasses defined using part relation- 
ships to terms for the whole animal during some life-cycle 
stage: 

'adult mushroom body' EquivalentTo 'mushroom body' 
that part_of some adult 

A major problem with this approach is that it requires 
different partonomies for different stages, with continu- 
ity of classes of structure between different stages being 
represented using developmental relations (continuity of 
individuals between non-rigid classifications cannot be 
represented in OWL). This causes problems for record- 
ing partonomy in anatomy that changes rapidly - as is the 
case with much developmental anatomy. Recent work on 
temporal indexing of part relations as part of work on ver- 
sion 2.0 of the Basic Formal Ontology (BFO) may provide 
at least partial solutions to this problem. 

Conclusions 

The addition of referenced textual definitions for over 
73% of terms in the DAO, along with extensive refac- 
toring to check for errors and infer multiple inheritance 
classification, has dramatically improved the accuracy 
and usefulness of the DAO as a queryable reference 
for wild-type anatomy. This is well illustrated by its 
usage on the VFB site as a source of information about 
Drosophila neuroanatomy and as a driver of anatomi- 
cal queries. The addition of textual definitions has made 
it much easier for curators to rapidly find appropriate 
terms for annotation with the DAO, while refactoring 
has dramatically improved the accuracy and complete- 
ness with which those annotations are grouped on FlyBase 
and other projects hosting DAO-based annotation. The 
VFB project again illustrates this well: prior to this work, 
the accuracy and completeness of the classification hier- 
archy for Drosophila neuro-anatomy was simply too poor 
to have been useful for driving VFB. 

Prior to refactoring, editing the ontology by hand was 
error-prone and tended to result in incomplete classifications. 
Refactoring, along with the documentation of 
design patterns, has put future development of the ontol- 
ogy on a sustainable footing. The patterns of formalisation 
used in the DAO are likely to be useful to developers of 
other anatomy ontologies. 

The extensive use of terms from external ontologies 
in refactoring provides a potential basis for cross-species 
querying based, for example, on shared function or cell 
types. The links to external ontologies have already been 
used to experimentally incorporate the DAO into an 
extended version of the Uberon multi-species anatomy 
ontology [18]. 

Methods 

Please see Table 1 for details of how to access ontology ver- 
sions and documentation. This paper refers to the versions 
found in release 2013-07-26. 

The master version of the DAO has remained in OBO 
format throughout refactoring and formalisation. Dur- 
ing editing, the ontology was continuously converted 
to OWL and classified to test inference and consis- 
tency. We used Protege 4 along with the ELK reasoner 
to browse inferred classification and run test queries. 
We also used a continuous integration server (Jenkins) 
to add automated definitions, check syntax and consis- 
tency, and generate various flavours of derivative OBO 
and OWL following every commit to our version control 
repository. 

For every term from an external ontology used in a 
DAO axiom we import all terms and axioms on paths to 
root from a pre-reasoned version of the external ontology 
using OORT (https://code.google.com/p/owltools/wiki/OortIntro). 
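
The idea of importing everything on paths to root can be sketched in Python as a simple graph traversal. This is a stand-in for illustration and not OORT's implementation; the function name and the toy hierarchy with its EXT: identifiers are hypothetical.

```python
def terms_on_paths_to_root(seed_terms, parents):
    """Collect the seed terms plus every ancestor reachable via `parents`."""
    needed = set()
    stack = list(seed_terms)
    while stack:
        term = stack.pop()
        if term in needed:
            continue  # already visited via another path
        needed.add(term)
        stack.extend(parents.get(term, ()))
    return needed


# Toy external ontology: each term maps to its direct superclasses.
parents = {
    "EXT:leaf": ["EXT:mid_a", "EXT:mid_b"],
    "EXT:mid_a": ["EXT:root"],
    "EXT:mid_b": ["EXT:root"],
    "EXT:unused": ["EXT:root"],
}
imported = terms_on_paths_to_root({"EXT:leaf"}, parents)
print(sorted(imported))
```

Terms that are not on any path from a used term to the root, such as EXT:unused in this toy hierarchy, are left out of the import.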

Conventions used in this paper: classes are referred to 
in free text using their label in italics, followed by their OWL 
short form ID in brackets. Following the OBO Foundry 
ID standard (http://www.obofoundry.org/id-policy.shtml), 
a full URI can be generated by prepending 
http://purl.obolibrary.org/obo/. In most cases this URI will 
resolve to OntoBee (http://www.ontobee.org/), returning 
XML if accessed programmatically. OBO IDs can be derived 
by converting the underscore in a short-form ID to a 
colon. All formal axioms are expressed in OWL Manchester 
syntax (OWL-MS) (http://www.w3.org/TR/owl2-manchester-syntax/). 
OWL-MS keywords are italicised. 
Note that "that" and "and" are interchangeable in OWL-MS. 
We choose which to use based entirely on readability. 
Object properties (relations) are in bold. The names of 
OWL entities (e.g. classes, object properties) are quoted 
only if they contain spaces. 
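
The two ID conversions described above are simple string operations. The following Python sketch illustrates them; the function names are ours and are not part of any DAO tooling.

```python
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def short_form_to_uri(short_form):
    """Build the full URI by prepending the OBO PURL base."""
    return OBO_PURL_BASE + short_form


def short_form_to_obo_id(short_form):
    """Convert the underscore in a short-form ID to a colon."""
    prefix, separator, local_id = short_form.partition("_")
    if not separator:
        raise ValueError(f"not a short-form ID: {short_form!r}")
    return f"{prefix}:{local_id}"


# mushroom body, as cited in this paper
uri = short_form_to_uri("FBbt_00005801")
obo_id = short_form_to_obo_id("FBbt_00005801")
print(uri)
print(obo_id)
```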

Endnotes 

^ We acknowledge that this aim is likely to be 
imperfectly realised in many cases, but our goal is simply 
an acceptably accurate reference source. 

^ This combination of equivalent class and subclass 
axioms constitutes a hidden general class inclusion 
axiom (GCI). It is expressed in this way, rather than as a 
separate GCI, for compliance with OBO format. 

^ Ito K, Shinomiya K, Armstrong J, Boyan G, 
Hartenstein V, Harzsch S, Heisenberg M, Homberg U, 
Jenett A, Keshishian H, L R, Rossler W, Simpson J, 
Strausfeld N, Strauss R, Vosshall L: A Coordinated 
Nomenclature System for the Insect Brain. Submitted 